Data Report — Chronic Kidney Disease

Source: UCI dataset 336

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric | Value
Dataset | Chronic Kidney Disease
Source | UCI dataset 336
Rows | 158
Columns | 25
Discrete | 13
Continuous | 12
SemMap | SemMap JSON-LD, SemMap HTML
Missingness | Not modeled

Variables and summary

variable | inferred | dist
age | continuous | 49.5633 ± 15.5122 [6, 39.25, 50.5, 60, 83]
bp | continuous | 74.0506 ± 11.1754 [50, 60, 80, 80, 110]
sg | continuous | 1.0199 ± 0.0055 [1.005, 1.02, 1.02, 1.025, 1.025]
al | discrete | Albumin negative [0]: 116 (73.42%); Albumin 2+ [3]: 15 (9.49%); Albumin 3+ [4]: 15 (9.49%); Albumin 1+ [2]: 9 (5.70%); Trace albumin [1]: 3 (1.90%); Albumin 4+ [5]: 0 (0.00%)
su | discrete | Sugar negative [0]: 140 (88.61%); Trace sugar [1]: 6 (3.80%); Sugar 1+ [2]: 6 (3.80%); Sugar 2+ [3]: 3 (1.90%); Sugar 3+ [4]: 2 (1.27%); Sugar 4+ [5]: 1 (0.63%)
rbc | discrete | Normal [normal]: 140 (88.61%)
pc | discrete | Normal [normal]: 129 (81.65%)
pcc | discrete | Not present [notpresent]: 144 (91.14%)
ba | discrete | Not present [notpresent]: 146 (92.41%)
bgr | continuous | 131.3418 ± 64.9398 [70, 97, 115.5, 131.75, 490]
bu | continuous | 52.5759 ± 47.3954 [10, 26, 39.5, 49.75, 309]
sc | continuous | 2.1886 ± 3.0776 [0.4, 0.7, 1.1, 1.6, 15.2]
sod | continuous | 138.8481 ± 7.4894 [111, 135, 139, 144, 150]
pot | continuous | 4.6367 ± 3.4764 [2.5, 3.7, 4.5, 4.9, 47]
hemo | continuous | 13.6873 ± 2.8822 [3.1, 12.6, 14.25, 15.775, 17.8]
pcv | continuous | 41.9177 ± 9.1052 [9, 37.5, 44, 48, 54]
wbcc | continuous | 8475.9494 ± 3126.8802 [3800, 6525, 7800, 9775, 26400]
rbcc | continuous | 4.8918 ± 1.0194 [2.1, 4.5, 4.95, 5.6, 8]
htn | discrete | Yes [yes]: 34 (21.52%)
dm | discrete | No [no]: 130 (82.28%); Yes [yes]: 28 (17.72%); No [no]: 0 (0.00%)
cad | discrete | Yes [yes]: 11 (6.96%)
appet | discrete | Good appetite [good]: 139 (87.97%)
pe | discrete | Yes [yes]: 20 (12.66%)
ane | discrete | Yes [yes]: 16 (10.13%)
class | discrete | Not chronic kidney disease [notckd]: 115 (72.78%); Chronic kidney disease [ckd]: 43 (27.22%); Chronic kidney disease [ckd]: 0 (0.00%)
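
The continuous entries above appear to follow the pattern "mean ± sd [min, q1, median, q3, max]"; the report does not say so explicitly. The sketch below shows one way such a string could be produced, assuming the sample standard deviation and linearly interpolated quartiles. The helper names and the toy values are illustrative and are not taken from the dataset or the report's code.

```python
import statistics


def five_number_summary(values):
    """Return (mean, sample sd, [min, q1, median, q3, max]) for a numeric column."""
    xs = sorted(values)
    # method="inclusive" interpolates linearly between order statistics,
    # treating the data as containing its own minimum and maximum.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return statistics.mean(xs), statistics.stdev(xs), [xs[0], q1, med, q3, xs[-1]]


def format_summary(values, digits=4):
    """Format a column in the assumed 'mean ± sd [min, q1, median, q3, max]' layout."""
    mean, sd, five = five_number_summary(values)
    body = ", ".join(f"{v:g}" for v in five)
    return f"{mean:.{digits}f} ± {sd:.{digits}f} [{body}]"


# Made-up values for illustration only (not rows from the dataset).
toy_column = [6, 39, 40, 50, 51, 60, 60, 83]
print(format_summary(toy_column))
```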

Fidelity summary

model | backend | disc jsd mean | disc jsd median | cont ks mean | cont w1 mean | downstream sign match
metasyn | metasyn | 0.0646 | 0.0514 | 0.216 | 56.8532 | 0.5263
clg_mi2 | pybnesian | 0.0744 | 0.0752 | 0.1996 | 62.539 | -
semi_mi5 | pybnesian | 0.0744 | 0.0752 | 0.1996 | 62.539 | -
ctgan_fast | synthcity | 0.1733 | 0.1798 | 0.7529 | 1042.22 | -
tvae_quick | synthcity | 0.1744 | 0.193 | 0.2715 | 86.9495 | -
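
The report does not state how the KS, W1 and JSD columns were computed. As a rough guide to what they measure, here is a minimal standard-library sketch of the two-sample Kolmogorov-Smirnov statistic, the 1-Wasserstein distance between empirical distributions, and the Jensen-Shannon divergence between category frequencies. The function names are illustrative, and the choice of base-2 divergence (rather than its square root, the JS distance) is an assumption.

```python
import bisect
import math
from collections import Counter


def _ecdf(sorted_xs, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in points)


def wasserstein_1(a, b):
    """1-Wasserstein distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        total += abs(_ecdf(a, lo) - _ecdf(b, lo)) * (hi - lo)
    return total


def js_divergence(a, b):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between category frequencies."""
    pa, pb = Counter(a), Counter(b)
    total = 0.0
    for cat in set(pa) | set(pb):
        p, q = pa[cat] / len(a), pb[cat] / len(b)
        m = (p + q) / 2
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total
```

Under these definitions KS and JSD are bounded by 1, while W1 is in the units of the variable, which is why the cont w1 mean column is dominated by large-scale variables. A KS of 1, as reported for age under ctgan_fast, means the real and synthetic samples do not overlap at all.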

Privacy summary

model | backend | n real | n synth | exact overlap rate | near duplicate rate eps | nn distance mean | k min | k pct lt5 | k map | rare qi reproduction rate | identifiability score | delta presence
metasyn | metasyn | 158 | 400 | 0 | 0.9937 | 0.0165 | 1 | 1 | 1 | 0 | - | 2.5714
clg_mi2 | pybnesian | 158 | 400 | 0 | 0.9873 | 0.0271 | 1 | 1 | 1 | 0 | - | 3.5
semi_mi5 | pybnesian | 158 | 400 | 0 | 0.9873 | 0.0271 | 1 | 1 | 1 | 0 | - | 3.5
ctgan_fast | synthcity | 158 | 256 | 0 | 0.3987 | 0.3484 | 1 | 1 | 1 | 0 | - | 28.6
tvae_quick | synthcity | 158 | 256 | 0 | 0.9873 | 0.0241 | 1 | 1 | 1 | 0 | - | 11
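
The exact definitions of near duplicate rate eps and nn distance mean are not given in the report. The reported rates are consistent with a denominator of 158 real rows (157/158 ≈ 0.9937, 156/158 ≈ 0.9873, 63/158 ≈ 0.3987), so the sketch below assumes that each real row is matched to its nearest synthetic row and counted as a near duplicate when that distance is at most eps. Euclidean distance on already-scaled numeric features, the function names and the eps value are all assumptions.

```python
import math


def nearest_distance(row, others):
    """Euclidean distance from one row to its closest row in `others`."""
    return min(math.dist(row, other) for other in others)


def near_duplicate_stats(real, synth, eps):
    """Return (near-duplicate rate, mean nearest-neighbour distance).

    Both quantities are computed over the real rows, which is an assumption
    suggested by the reported rates rather than a documented definition.
    """
    dists = [nearest_distance(r, synth) for r in real]
    rate = sum(d <= eps for d in dists) / len(dists)
    return rate, sum(dists) / len(dists)


# Toy, already-scaled feature vectors (illustrative only).
real_rows = [(0.0, 0.0), (1.0, 1.0)]
synth_rows = [(0.0, 0.1), (5.0, 5.0)]
rate, mean_dist = near_duplicate_stats(real_rows, synth_rows, eps=0.2)
```

Read this way, a rate near 1 together with a small mean distance indicates that almost every real record has a close synthetic neighbour, whereas ctgan_fast places synthetic rows much further from the real ones.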

Models

Real data

Model: metasyn (metasyn)

Per-variable fidelity
variable | type | KS | W1 | JSD
age | continuous | 0.1125 | 2.3701 | -
bp | continuous | 0.3 | 3.9054 | -
sg | continuous | 0.4062 | 0.0019 | -
al | discrete | - | - | 0.094
su | discrete | - | - | 0.1747
rbc | discrete | - | - | 0.0228
pc | discrete | - | - | 0.0535
pcc | discrete | - | - | 0.0514
ba | discrete | - | - | 0.1342
bgr | continuous | 0.2425 | 23.229 | -
Downstream metrics
metric value
sign_match_rate 0.5263
formula col_class ~ Q('age') + Q('bp') + Q('sg') + Q('al') + Q('su') + C(Q('rbc'), levels=['normal', 'abnormal']) + C(Q('pc'), levels=['normal', 'abnormal']) + C(Q('pcc'), levels=['present', 'notpresent']) + C(Q('ba'), levels=['present', 'notpresent']) + Q('bgr') + Q('bu') + Q('sc') + Q('sod') + Q('pot') + Q('hemo') + Q('pcv') + Q('wbcc') + Q('rbcc') + C(Q('htn'), levels=['yes', 'no']) + C(Q('dm'), levels=['yes', 'no']) + C(Q('cad'), levels=['yes', 'no']) + C(Q('appet'), levels=['good', 'poor']) + C(Q('pe'), levels=['yes', 'no']) + C(Q('ane'), levels=['yes', 'no']) + Q('age'):Q('bp') + Q('bp'):Q('sg') + Q('sg'):Q('al') + Q('al'):Q('su') + Q('su'):C(Q('rbc'), levels=['normal', 'abnormal'])
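
The report gives the downstream formula but not the procedure behind sign_match_rate. A plausible reading is that the same model is fitted once on the real data and once on the synthetic data, and the rate is the share of coefficients whose signs agree. The sketch below implements only that final comparison; the function names and coefficient values are made up for illustration. For reference, 0.5263 equals 10/19 to four decimals, which would be consistent with 10 of 19 compared terms agreeing, although the report does not say which terms were compared.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_match_rate(real_coefs, synth_coefs):
    """Share of terms present in both fits whose coefficients have the same sign."""
    shared = [term for term in real_coefs if term in synth_coefs]
    if not shared:
        raise ValueError("no terms in common between the two fits")
    matches = sum(sign(real_coefs[t]) == sign(synth_coefs[t]) for t in shared)
    return matches / len(shared)


# Hypothetical coefficients from two fits of the same formula.
real_fit = {"age": 0.3, "bp": -1.2, "sg": 2.0, "al": -0.1}
synth_fit = {"age": 0.1, "bp": 0.4, "sg": 1.5, "al": -0.7}
example_rate = sign_match_rate(real_fit, synth_fit)
```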
Privacy metrics
metric value
n_real 158
n_synth 400
exact_overlap_rate 0
near_duplicate_rate_eps 0.9937
nn_distance_mean 0.0165
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 2.5714
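
The k_min and k_pct_lt5 rows read like k-anonymity statistics: the size of the smallest group of rows sharing the same quasi-identifier values, and the share of rows that fall in groups smaller than 5. The report does not name the quasi-identifier columns or confirm these definitions, so the sketch below is an interpretation, with hypothetical column names and toy rows.

```python
from collections import Counter


def k_anonymity_stats(rows, quasi_identifiers):
    """Return (k_min, share of rows in equivalence classes smaller than 5)."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    class_sizes = Counter(keys)
    k_min = min(class_sizes.values())
    pct_lt5 = sum(1 for key in keys if class_sizes[key] < 5) / len(keys)
    return k_min, pct_lt5


# Toy rows: five share one quasi-identifier combination, one is unique.
toy_rows = [{"age_band": "40-49", "htn": "no"}] * 5 + [{"age_band": "80+", "htn": "yes"}]
k_min, pct_lt5 = k_anonymity_stats(toy_rows, ["age_band", "htn"])
```

Under this reading, k_min = 1 with k_pct_lt5 = 1 would mean that every row sits in a group of fewer than 5, which is unsurprising for 158 rows when the quasi-identifiers include continuous measurements.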
variable distribution
age core.normal
bp core.lognormal
sg core.truncated_normal
al core.multinoulli
su core.multinoulli
rbc core.multinoulli
pc core.multinoulli
pcc core.multinoulli
ba core.multinoulli
bgr core.truncated_normal
bu core.lognormal
sc core.lognormal
sod core.truncated_normal
pot core.normal
hemo core.truncated_normal
pcv core.truncated_normal
wbcc core.lognormal
rbcc core.normal
htn core.multinoulli
dm core.multinoulli
cad core.multinoulli
appet core.multinoulli
pe core.multinoulli
ane core.multinoulli
class core.multinoulli
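
The table above lists one fitted distribution family per variable. As a rough illustration of what sampling from such a specification involves, the sketch below draws each column independently from its own distribution, so no relationships between columns are preserved. This is not metasyn's API; the spec layout and helper names are hypothetical, and the parameters are placeholders loosely based on the summary statistics in this report, not the fitted values (which the report does not list).

```python
import random


def sample_truncated_normal(rng, mu, sigma, lower, upper):
    """Rejection-sample a normal draw restricted to [lower, upper]."""
    while True:
        x = rng.gauss(mu, sigma)
        if lower <= x <= upper:
            return x


def sample_row(rng, spec):
    """Draw one synthetic row, sampling every column independently."""
    row = {}
    for name, (kind, params) in spec.items():
        if kind == "normal":
            row[name] = rng.gauss(*params)
        elif kind == "lognormal":
            row[name] = rng.lognormvariate(*params)
        elif kind == "truncated_normal":
            row[name] = sample_truncated_normal(rng, *params)
        elif kind == "multinoulli":
            labels, weights = params
            row[name] = rng.choices(labels, weights=weights, k=1)[0]
        else:
            raise ValueError(f"unknown distribution kind: {kind}")
    return row


# Placeholder parameters (assumptions, not the fitted model).
EXAMPLE_SPEC = {
    "age": ("normal", (49.5633, 15.5122)),
    "bp": ("lognormal", (4.3, 0.15)),
    "sg": ("truncated_normal", (1.0199, 0.0055, 1.005, 1.025)),
    "class": ("multinoulli", (["notckd", "ckd"], [0.7278, 0.2722])),
}

rng = random.Random(0)
synthetic_rows = [sample_row(rng, EXAMPLE_SPEC) for _ in range(200)]
```

Independent sampling of this kind can match marginal statistics closely while ignoring dependence between variables, which is worth keeping in mind when comparing the per-variable fidelity figures with the downstream sign match.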

Model: clg_mi2 (pybnesian)

Per-variable fidelity
variable | type | KS | W1 | JSD
age | continuous | 0.1275 | 2.6951 | -
bp | continuous | 0.3025 | 3.9078 | -
sg | continuous | 0.3162 | 0.0019 | -
al | discrete | - | - | 0.0925
su | discrete | - | - | 0.202
rbc | discrete | - | - | 0.016
pc | discrete | - | - | 0.095
pcc | discrete | - | - | 0.0585
ba | discrete | - | - | 0.0919
bgr | continuous | 0.12 | 12.7667 | -
Privacy metrics
metric value
n_real 158
n_synth 400
exact_overlap_rate 0
near_duplicate_rate_eps 0.9873
nn_distance_mean 0.0271
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 3.5

Model: semi_mi5 (pybnesian)

Per-variable fidelity
variable | type | KS | W1 | JSD
age | continuous | 0.1275 | 2.6951 | -
bp | continuous | 0.3025 | 3.9078 | -
sg | continuous | 0.3162 | 0.0019 | -
al | discrete | - | - | 0.0925
su | discrete | - | - | 0.202
rbc | discrete | - | - | 0.016
pc | discrete | - | - | 0.095
pcc | discrete | - | - | 0.0585
ba | discrete | - | - | 0.0919
bgr | continuous | 0.12 | 12.7667 | -
Privacy metrics
metric value
n_real 158
n_synth 400
exact_overlap_rate 0
near_duplicate_rate_eps 0.9873
nn_distance_mean 0.0271
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 3.5

Model: ctgan_fast (synthcity)

Per-variable fidelity
variable | type | KS | W1 | JSD
age | continuous | 1 | 34.0312 | -
bp | continuous | 0.1016 | 2.5 | -
sg | continuous | 0.2461 | 0.0028 | -
al | discrete | - | - | 0.3647
su | discrete | - | - | 0.2342
rbc | discrete | - | - | 0.1798
pc | discrete | - | - | 0.2748
pcc | discrete | - | - | 0.073
ba | discrete | - | - | 0.1057
bgr | continuous | 0.9062 | 178.988 | -
Privacy metrics
metric value
n_real 158
n_synth 256
exact_overlap_rate 0
near_duplicate_rate_eps 0.3987
nn_distance_mean 0.3484
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 28.6

Model: tvae_quick (synthcity)

Per-variable fidelity
variable | type | KS | W1 | JSD
age | continuous | 0.1562 | 4.4251 | -
bp | continuous | 0.1328 | 2.7344 | -
sg | continuous | 0.168 | 0.0018 | -
al | discrete | - | - | 0.2962
su | discrete | - | - | 0.175
rbc | discrete | - | - | 0.1031
pc | discrete | - | - | 0.2671
pcc | discrete | - | - | 0.073
ba | discrete | - | - | 0.193
bgr | continuous | 0.1992 | 13.2205 | -
Privacy metrics
metric value
n_real 158
n_synth 256
exact_overlap_rate 0
near_duplicate_rate_eps 0.9873
nn_distance_mean 0.0241
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 11